Fix high response latency with multiple API-requests targeting non-existent hosts or services#10883
Conversation
f83e8c7 to
c14e5de
Compare
83f42f4 to
1c40368
Compare
julianbrost
left a comment
There was a problem hiding this comment.
You have a commit titled "Consistently use std::exception_ptr", but I noticed two (indirect) uses of boost::exception_ptr through calls to boost::current_exception() in the code base.
icinga2/lib/config/expression.cpp
Lines 61 to 64 in 1c40368
That one is probably fine given that boost::errinfo_nested_exception lives within the boost exception universe even though the documentation leaves the exact exception pointer type unspecified.
However, there's a second instance where we can pretty sure get rid of the implicit boost::exception_ptr:
icinga2/lib/cli/consolecommand.cpp
Lines 439 to 442 in 1c40368
Not sure why this isn't just throw;.
Apart from that, this is looking fine. Spun it up for a test, the speedup is amazing. This is so stupid overall 🤣
As I noted in the PR description
No idea. I'll add it in a separate commit. |
1c40368 to
f2b971e
Compare
This was likely meant as a fallback for non-`std::exception` exception pointers, but it would never have been reached, because `(std|boost)::rethrow_exception()` would have thrown those out of the function scope. This fixes the fallback by making it a catch handler and also using the more direct way of getting this fallback diagnostic info inside a catch handler.
`DiagnosticInformation()` wasn't able to take a `std::exception_ptr` due to the missing conversion on older boost versions, so now everything uses the std::exception_ptr instead. There are still a few reasons to use `boost::exception` in some places, but for exception pointers, the standard one should be better in most cases and almost never requires to include an extra header.
The `diagnostic_information` string will now generated if and when it is required (i.e. the "verbose" parameter is true).
f2b971e to
171e9ee
Compare
|
Backport failed for Please cherry-pick the changes locally and resolve any conflicts. git fetch origin support/2.16
git worktree add -d .worktree/backport-10883-to-support/2.16 origin/support/2.16
cd .worktree/backport-10883-to-support/2.16
git switch --create backport-10883-to-support/2.16
git cherry-pick -x aa6c63b52eed9efcef03deb0193e574f3cb24e62 a9783fdde45a73fda40cc4c19fde3a64de6edae2 393724bb2a1c63e718a2f06b369f102258c9b98b 171e9ee37a1db3b13843f9001b162a420dd21d4a |
This fixes the high response latency observed in #10875 when multiple requests target non-existent hosts in parallel. The reason is that for almost all API endpoints that target config objects diagnostic information including a stack trace is generated, regardless if it is actually returned (which only happens when the "verbose" parameter is given). The solution is to pass the
exception_ptrintoSendJsonError()and only generate theDiagnosticInformationfrom it when needed.Solution
SendJsonError()now takes astd::exception_ptrargument instead of adagnosticInformationstring that is generated at the call site. This is cheap and doesn't cause any copies. Then, where the actual decision is made to include thediagnostic_information(i.e. dependent on the "verbose" parameter),DiagnosticInformation(<eptr>)is run. The overload already existed, so there wasn't even a need to rethrow the exception and process the string from the result.This is a clean solution, and the change makes sense even without the latency amplification in mind, since pointless work is avoided in any case.
However, some yak-shaving was required to feed
SendJsonError()with an exception pointer gathered withstd::current_exception()at the call-site. I had to make the entire code-base usestd::exception_ptrinstead ofboost::exception_ptr. On the newest boost version, this is just a typedef, on previous versions the boost one could be constructed from the std one, but there's also old versions we still support which didn't have that conversion. And that refactor in turn uncovered a small bug in theDiagnosticInformation(<eptr>)which had a dead-code fallback that then failed to compile and is now fixed.Testing
Using the test-script that was provided in #10875, the improvement is clearly observable:
Before
With existent host/service:
With non-existent host/service:
After
With existent host/service:
With non-existent host/service:
You can see that the error responses for non-existant targets are now actually much faster than successful requests, as one would expect.
Fixes #10875.